Skip to content

Add RuleSetCustomizer SPI for multi-stage planner Calcite rules#18387

Open
gortiz wants to merge 8 commits into
apache:masterfrom
gortiz:broker-rule-customizer-spi
Open

Add RuleSetCustomizer SPI for multi-stage planner Calcite rules#18387
gortiz wants to merge 8 commits into
apache:masterfrom
gortiz:broker-rule-customizer-spi

Conversation

@gortiz
Copy link
Copy Markdown
Contributor

@gortiz gortiz commented Apr 30, 2026

Introduces a ServiceLoader-discovered SPI that lets plugins add, remove, reorder, or replace the Calcite HEP rules used by the multi-stage query engine. The OSS defaults are themselves contributed by DefaultRuleSetCustomizer (registered via META-INF/services), and plugin customizers run on top.

  • New Phase enum covers every HEP phase QueryEnvironment runs: BASIC, FILTER_PUSHDOWN, PROJECT_PUSHDOWN, PRUNE, POST_LOGICAL, POST_LOGICAL_V2, POST_LOGICAL_ENRICHED_JOIN. Append-only contract.
  • New RuleSetCustomizer interface — plugins implement customize(Phase, List<RelOptRule>) and modify the per-phase list in place.
  • New PinotRuleSet owns the per-phase, immutable rule lists and is the single source of truth read by QueryEnvironment. The default instance is built lazily from ServiceLoader discovery.
  • DefaultRuleSetCustomizer replaces the static PinotQueryRuleSets lists; the deleted class is the rename source.
  • QueryEnvironment reads every phase's rules through PinotRuleSet via a new Config#getRuleSet() @Value.Default. The per-query sortExchangeCopyLimit override stays inside getTraitProgram (per-query copy of POST_LOGICAL with all PinotSortExchangeCopyRule instances replaced by the override).

Per-query usePlannerRules / skipPlannerRules filtering is unchanged (still applied on top of the rule lists in getOptProgram).

Introduces a ServiceLoader-discovered SPI that lets plugins add, remove,
reorder, or replace the Calcite HEP rules used by the multi-stage query
engine. The OSS defaults are themselves contributed by
DefaultRuleSetCustomizer (registered via META-INF/services), and plugin
customizers run on top.

- New `Phase` enum covers every HEP phase QueryEnvironment runs:
  BASIC, FILTER_PUSHDOWN, PROJECT_PUSHDOWN, PRUNE, POST_LOGICAL,
  POST_LOGICAL_V2, POST_LOGICAL_ENRICHED_JOIN. Append-only contract.
- New `RuleSetCustomizer` interface — plugins implement
  `customize(Phase, List<RelOptRule>)` and modify the per-phase list
  in place.
- New `PinotRuleSet` owns the per-phase, immutable rule lists and is
  the single source of truth read by `QueryEnvironment`. The default
  instance is built lazily from `ServiceLoader` discovery.
- `DefaultRuleSetCustomizer` replaces the static `PinotQueryRuleSets`
  lists; the deleted class is the rename source.
- `QueryEnvironment` reads every phase's rules through `PinotRuleSet`
  via a new `Config#getRuleSet()` `@Value.Default`. The per-query
  `sortExchangeCopyLimit` override stays inside `getTraitProgram`
  (per-query copy of POST_LOGICAL with all `PinotSortExchangeCopyRule`
  instances replaced by the override).

Per-query `usePlannerRules` / `skipPlannerRules` filtering is unchanged
(still applied on top of the rule lists in `getOptProgram`).
@gortiz gortiz added extension-point Adds or modifies an extension/SPI point multi-stage Related to the multi-stage query engine labels Apr 30, 2026
Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a few high-signal SPI/plugin issues; see inline comments.

import org.apache.calcite.plan.RelOptRule;


/// Plugin SPI for customizing the multi-stage planner's Calcite rule sets.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is documented as a plugin SPI, but Pinot plugin realms only import org.apache.pinot.spi by default. A standard plugin packaged with pinot-plugin.properties cannot even link org.apache.pinot.query.planner.rules.RuleSetCustomizer without extra realm-import wiring, so the advertised extension point is broken for normal plugins. Please move this SPI into pinot-spi or import this package by default before landing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shoudl be fixed now

/// Discovers every [RuleSetCustomizer] via [ServiceLoader] and builds a
/// rule set from them. Used by [#defaultInstance()] and by callers that
/// don't have an externally-managed customizer list.
public static PinotRuleSet loadFromServiceLoader() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if a plugin adds the extra class-realm import, this discovery path only does ServiceLoader.load(RuleSetCustomizer.class). Broker startup does not enumerate Pinot plugin classloaders here, and PluginClassLoader keeps plugin jars isolated, so third-party customizers will never be discovered and only DefaultRuleSetCustomizer will run. Please load providers from PluginManager/plugin realms explicitly instead of relying on the default ServiceLoader lookup.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had another branch where I introduced a guice based mechanism to do the same that fixes this, but I'm not sure whether we want to use Guice or not, so I'm fixing it here

/// applied later by `QueryEnvironment.getTraitProgram`, which swaps the
/// configured `PinotSortExchangeCopyRule` on a per-query copy of the
/// `POST_LOGICAL` list.
public final class DefaultRuleSetCustomizer implements RuleSetCustomizer {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rename removes the existing public org.apache.pinot.calcite.rel.rules.PinotQueryRuleSets type entirely. Any out-of-tree planner extension or test compiled against current releases will fail to load or compile after upgrade. Please keep a deprecated bridge in the old package instead of replacing the class outright.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be fixed now

gortiz and others added 6 commits May 11, 2026 15:33
Three issues raised in PR review:

1. Restore deprecated PinotQueryRuleSets bridge. The original class
   org.apache.pinot.calcite.rel.rules.PinotQueryRuleSets had a public API
   (five static rule lists + getPinotPostRules). Deleting it was a backward-
   compat break for out-of-tree code. The bridge forwards to the canonical
   constants in DefaultRuleSetCustomizer; per-query sortExchangeCopyLimit
   overrides are now handled by QueryEnvironment.getTraitProgram.
   DefaultRuleSetCustomizer rule-list fields made public to support this.

2. Enumerate plugin classloaders in loadFromServiceLoader. ServiceLoader.load
   with the context classloader only finds application-classpath customizers;
   plugin JARs in isolated ClassRealm/PluginClassLoader realms are invisible.
   PinotRuleSet.loadFromServiceLoader now iterates PluginManager.getPlugin
   ClassLoaders() and does a per-classloader ServiceLoader.load. Duplicates
   (same class name discovered via both classloaders) are de-duped by class
   name. RuleSetCustomizer Javadoc explains the pinot-plugin.properties
   importFrom.pinot requirement for plugins implementing this interface.

3. Add getPluginClassLoaders() to PluginManager. Returns the set of all
   loaded plugin classloaders (old-style PluginClassLoader registry and
   new-style ClassRealm realms), excluding the DEFAULT slot. Used by
   loadFromServiceLoader for plugin discovery. Covered by two new tests in
   PluginManagerTest.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase and RuleSetCustomizer (and their ServiceLoader registration) move from
pinot-query-planner to a new pinot-query-planner-spi module that only depends
on calcite-core. This lets plugins implement the broker rule-customization SPI
without pulling in the full query planner on their compile classpath.

PluginManager now exports org.apache.pinot.query.planner.rules and
org.apache.calcite.plan from the pinot realm into every plugin realm so
plugins can link RuleSetCustomizer and RelOptRule without bundling or shading
those packages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_LOGICAL_V2, add realm test

- Move META-INF/services from pinot-query-planner-spi to pinot-query-planner:
  the SPI jar should not register DefaultRuleSetCustomizer which lives in the
  implementation module.

- Rename Phase.POST_LOGICAL_V2 → Phase.POST_LOGICAL_PHYSICAL: the V2 suffix
  leaked an internal versioning convention into a permanent SPI enum name.
  The new name encodes the semantic (physical optimizer enabled). All call
  sites updated (DefaultRuleSetCustomizer, QueryEnvironment, PinotRuleSetTest,
  PinotQueryRuleSets bridge).

- Add PluginRealmExportTest in pinot-query-planner-spi: verifies that
  PluginManager exports org.apache.pinot.query.planner.rules and
  org.apache.calcite.plan from the pinotRealm so plugin classloaders can load
  RuleSetCustomizer and RelOptRule without bundling those packages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The package-info Maven plugin generates a package-info.java annotated with
@javax.annotation.ParametersAreNonnullByDefault for every module. Unlike
other modules, pinot-query-planner-spi has no transitive dependency that
pulls in javax.annotation-api, causing a compilation failure in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… delegates

Reverses the movement of rule lists out of PinotQueryRuleSets into
DefaultRuleSetCustomizer, reducing the diff of this PR. PinotQueryRuleSets
keeps BASIC_RULES, FILTER_PUSHDOWN_RULES, PROJECT_PUSHDOWN_RULES, PRUNE_RULES,
PINOT_POST_RULES_V2, and the new POST_LOGICAL_RULES (extracted from the old
getPinotPostRules). DefaultRuleSetCustomizer.customize() simply delegates to
those lists. A TODO comment notes the rules may be consolidated into
DefaultRuleSetCustomizer once the RuleSetCustomizer SPI is the established
extension point.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ParametersAreNonnullByDefault (used in generated package-info.java) is from
com.google.code.findbugs:jsr305, not javax.annotation:javax.annotation-api.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@yashmayya yashmayya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there's a number of CI failures that need to be addressed

* specific language governing permissions and limitations
* under the License.
*/
package org.apache.pinot.query.planner.rules;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be in a new pinot-query-planner-spi module but we're using the same non-spi package?

/// <p>The {@code sortExchangeCopyLimit} parameter is <b>no longer applied here</b>. Per-query
/// sort-exchange-copy threshold overrides are now handled by {@code QueryEnvironment.getTraitProgram},
/// which swaps the rule on a per-query copy of {@link #POST_LOGICAL_RULES}.
public static List<RelOptRule> getPinotPostRules(int sortExchangeCopyLimit) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sortExchangeCopyLimit parameter here is now unused?

…ginManager realm exports

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 88.37209% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.85%. Comparing base (88b0d29) to head (d8f7c22).
⚠️ Report is 66 commits behind head on master.

Files with missing lines Patch % Lines
...apache/pinot/query/planner/rules/PinotRuleSet.java 76.66% 5 Missing and 2 partials ⚠️
.../query/planner/rules/DefaultRuleSetCustomizer.java 88.88% 1 Missing and 1 partial ⚠️
...he/pinot/calcite/rel/rules/PinotQueryRuleSets.java 50.00% 1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (88b0d29) and HEAD (d8f7c22). Click for more details.

HEAD has 16 uploads less than BASE
Flag BASE (88b0d29) HEAD (d8f7c22)
java-21 5 1
unittests 2 1
temurin 5 1
unittests2 1 0
integration 3 0
integration1 1 0
integration2 1 0
custom-integration1 1 0
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18387      +/-   ##
============================================
- Coverage     63.43%   55.85%   -7.59%     
+ Complexity     1698      817     -881     
============================================
  Files          3253     2558     -695     
  Lines        198970   148346   -50624     
  Branches      30813    23962    -6851     
============================================
- Hits         126221    82853   -43368     
+ Misses        62681    58415    -4266     
+ Partials      10068     7078    -2990     
Flag Coverage Δ
custom-integration1 ?
integration ?
integration1 ?
integration2 ?
java-21 55.85% <88.37%> (-7.59%) ⬇️
temurin 55.85% <88.37%> (-7.59%) ⬇️
unittests 55.85% <88.37%> (-7.59%) ⬇️
unittests1 55.85% <88.37%> (+0.48%) ⬆️
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extension-point Adds or modifies an extension/SPI point multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants